We thank all the discussants for their contributions and in particular we 
wish to thank Hampel. The concept of breakdown point goes back to his 
Ph.D. thesis [Hampel (1968)] and he was the first to exhibit a high break- 
down equivariant regression estimate now known as the least median of 
squares [Hampel (1975)], a fact which is sometimes forgotten. These two 
sources are the starting point of the present discussion. In his contribution 
Hampel gives us insight into the thoughts which led to his definition of break- 
down point, intended as it was to complement the infinitesimal behavior of 
a functional as described by the influence function. Hampel emphasizes that 
equivariance considerations were not part of his definition and he had in 
mind correlation statistics "where there is no equivariance at all." He con- 
siders correlation in some detail and, as we disagree with him on this very 
topic, we give a detailed analysis of correlation statistics in our rejoinder. 
We hope that this will help clarify the issues involved. 

1. On breakdown. The first signification of the word "breakdown" given 
in the Oxford Dictionary starts with the following subsignification: 

"1. a. The act of breaking and falling down: a ruinous downfall, a collapse." 

2. Breakdown to points and variations. Genton and Lucas and Oja ar- 
gue for the usefulness of the breakdown concept in situations not covered 
by the results of our paper. They claim that at least in an intuitive sense 
breakdown occurs if the value of a functional is driven to the boundary or to 
an interior point which is independent of the uncontaminated sample. A for- 
mal definition of breakdown point is given which is intended to cover such 
possibilities. A first version is to be found in Genton and Lucas (2003) and is 
referred to by Oja. It defines the breakdown point as the smallest amount of 
contamination which can cause the statistic to assume only a finite number 
of values independently of the uncontaminated observations. On the basis of 
this definition the arithmetic mean is claimed to have a finite-sample
breakdown point of 1/n. The argument is as follows: if the first observation of
the sample is contaminated, $(\zeta_1, y_2, \ldots, y_n)$, and we let $\zeta_1$
tend to infinity, then the sample mean tends to infinity, that is, to a single
value which is independent of $y_2, \ldots, y_n$. However, for any finite value
of $\zeta_1$ the arithmetic mean takes on a continuum of values on varying the
uncontaminated part of the sample. The only way of reducing the arithmetic mean
to a single value is to introduce the symbol $\infty$ as a possible value for
the contamination. The symbol $\infty$ is thus elevated to a real entity for
data. The new definition avoids 
this but our reaction is similar: any definition of breakdown based on the 
concept of Lebesgue measure zero must be at fault. According to the new 
definition the functional 

$$T(P_n) = \max\{-n, \min\{n, T_{LS}(P_n)\}\}$$
has a breakdown point of 1/n. We perturb it and put

$$T^*(P_n) = T(P_n) + \frac{1}{n}\int \sin(x)\,dP_n(x).$$

The set of values taken on by T*{Pn) as we vary the uncontaminated part of 
the sample has Lebesgue measure at least as long as not all the sample 
is contaminated, and the breakdown point is therefore 1. As the perturbation 
tends with 1/n to zero, T* remains consistent and asymptotically normal at 
the model. Oja mentions the classical skewness statistic 

$$b_1 = \frac{\left((1/n)\sum (x_i - \bar{x})^3\right)^2}{\left((1/n)\sum (x_i - \bar{x})^2\right)^3},$$

but this can be treated in the same manner by putting 

$$b_1^* = b_1 + \sin(n b_1),$$

which is still invariant but does not converge. 
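
The effect of the perturbation can be checked numerically. The following is a minimal sketch, assuming that the inner functional of the truncated estimate is the arithmetic mean; the function names and the sample values are illustrative only and do not come from the discussion.

```python
import math

def clamped_mean(sample):
    """T(P_n) = max(-n, min(n, mean)): the mean truncated to [-n, n]."""
    n = len(sample)
    mean = sum(sample) / n
    return max(-n, min(n, mean))

def perturbed(sample):
    """T*(P_n) = T(P_n) + (1/n) * integral of sin(x) dP_n(x)."""
    n = len(sample)
    # The integral against the empirical measure is the average of sin(x_i).
    return clamped_mean(sample) + sum(math.sin(x) for x in sample) / n ** 2

outlier = 1e9                       # one grossly contaminated observation
a = [outlier, 0.1, 0.2, 0.3, 0.4]   # two different uncontaminated parts
b = [outlier, 1.1, 1.2, 1.3, 1.4]

print(clamped_mean(a), clamped_mean(b))   # both are clamped to n = 5
print(perturbed(a), perturbed(b))         # still depend on the clean part
```

With a single extreme observation the truncated mean is pinned at n whatever the other observations are, whereas the perturbed version keeps varying with them, although it never moves further than 1/n from the truncated mean.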

A second criticism we made of the definition of Genton and Lucas (2003) 
is that any realizable functional immediately breaks down for the simple rea- 
son that it can only take on a finite number of values; all data and statistics 
are of finite precision. Genton and Lucas mention this in their contribution 
as a weakness of the new definition and so it is. No reasonable definition 
of breakdown can rely on the myth of a continuum of possible values for a 
statistic or the associated myth of infinite precision. When applying mathe- 
matics to applied problems it is important that the discrete problem can be 
well approximated by the continuous one. Genton and Lucas' use of infinite 
precision and a continuum of values and sets of Lebesgue measure zero is not 
of this sort. Their continuous formulations do not approximate the discrete 
world of statistics. 
We point out further that the definitions of Genton and Lucas, and also of 
Oja (Definition 4), represent a complete break with the meaning of break- 
down as it is used in statistics. Transferred to the statistical context the 
"ruinous downfall" of Section 1 is expressed in terms of distances and ar- 
bitrarily large bias. None of this is present in a concept of breakdown in 
terms of the number or Lebesgue measure of the set of all possible limits 
of contaminated samples. No mention is made of bias, that is, how far the 
value of the statistic can move from its value at the uncontaminated sample 
for a given amount of contamination. Yet it is this which has motivated 
robust statistics from the influence function via bias to breakdown point. 
In a sense the proposal put forward by Genton and Lucas is the very op- 
posite of this. Rather than moving arbitrarily far, the statistic has broken 
down if it does not move at all. It is said that it then cannot convey any 
information in the sample. Even this is not always the case. Consider the 
statistical functional T75 which takes on the fixed value of 75 for all data 
sets. This has a breakdown point of 0 according to the Genton-Lucas
definition. German insurance companies are required to use life expectancies 
specified by law. In the case of a male they could, for example, be forced to 
use the functional T75 to estimate life expectancy in years. The effect can 
be felt but it is not a ruinous downfall. It would be a ruinous downfall for 
the German insurance companies if they had to use a value of 65 and the 
reason is that 65 differs from the experienced lengths of life much more than 
does the fixed value of 75. Here as in the usual definition of breakdown it is 
the discrepancy which is important. 

3. Perturbations. The criticism we gave of Genton and Lucas's defini- 
tions of breakdown has wider implications. We regard robust statistics as a 
perturbation theory for statistics. In particular, robust statistics must con- 
cern itself with perturbations of models and data sets and, in consequence, 
it must be able to deal with finite precision. The perturbations involved 
should be realistic ones and this will in general exclude perturbations de- 
scribed by the gross error neighborhood, which is simply too small. Unfortu- 
nately, the idea of stability under perturbations is sometimes lost, especially 
in theoretical work. Suppose a theorem on the existence and uniqueness of a 
functional requires assumptions about the existence and differentiability of a 
density function. These assumptions should then not be referred to as "under 
weak assumptions" but rather as "under very restrictive assumptions which 
violate the spirit of robustness." Densities disappear under perturbations, 
likelihood disappears under perturbations as does the property of being a 
Lebesgue set of measure zero, efficiency is pathologically discontinuous, and 
so on. Perturbations and their consequences should be taken seriously by all 
who work in the area of robust statistics. 
4. Affine equivariant location functionals. The example of location
functionals makes use of only the translation group although it seems natural
to require affine equivariance. The problem is that for the affine group we
have $G_1 = \emptyset$, since if we iterate $\mathcal{A}(\theta) = A(\theta) + b$
this will in general not tend to infinity, so Theorem 3.1 is not applicable.
The highest breakdown point for translation equivariant functionals is 1/2 but
there are affine equivariant location functionals which are based on scatter
functionals and which have a breakdown point of at least that of the scatter
functional, namely $(1 - \Delta(P))/2$. The gap has not been closed but
Rousseeuw gives a sufficient condition for the bound $(1 - \Delta(P))/2$ to
hold. His argument makes use of the convex hull which can be seen as a form of
scatter functional albeit with a low breakdown point. In Davies and Gather
(2002) we showed that the bound 1/2 is attainable at least at some empirical
measures so that the gap remains. 

5. Metrics on $\mathcal{P}$. We agree with Hampel's comments on the gross error
neighborhood but we do not like either of the alternatives he suggests. First,
total variation is not much better than the gross error neighborhood; a
distribution $Q$ lies in the $\varepsilon$ total variation neighborhood of $P$
if and only if $Q - P = \varepsilon(H_1 - P) - \varepsilon(H_2 - P)$ for some
distributions $H_1$ and $H_2$ [see Rieder (2000), page 7]. Second, the Prohorov
metric defined by 

(5.1) $d_{pr}(P, Q) = \inf\{\varepsilon > 0 : P(A) \le Q(A^{\varepsilon}) + \varepsilon\}$,
where 

(5.2) $A^{\varepsilon} = \{x : d(x, A) < \varepsilon\}$,

conflates the last $\varepsilon$ of (5.1), where it operates as a dimensionless
probability, with the $\varepsilon$ of (5.2), where it represents a rounding
error. We refer to Davies (1993) for a discussion of this point. Other simpler
metrics are also capable of dealing with rounding errors. The Kolmogorov metric
is defined by 

(5.3) $d_{ko}(P, Q) = \sup\{|P(I) - Q(I)| : I = (-\infty, x], x \in \mathbb{R}\}$.

Let $P_n$ be the empirical distribution of some data and $P_n^*$ be the
empirical distribution of the same data after rounding. If the rounding
$\delta$ is less than the minimum gap between the unrounded observations, then
$d_{ko}(P_n, P_n^*) = 1/n$, assuming at least one observation to have been
altered. It is sometimes argued that $d_{ko}$ is too weak in the data
analytical sense for comparing distributions. There are stronger versions which
go under the name of Kuiper metrics. The Kuiper metric of order $k$ is defined
by 

(5.4) $d_{ku,k}(P, Q) = \sup\left\{\sum_{j=1}^{k} |P(I_j) - Q(I_j)| : I_1, \ldots, I_k \text{ disjoint finite intervals}\right\}$.
Kuiper metrics of order $k = 19$ are used in Davies and Kovac (2004) in
the context of providing approximate densities for data. The Kolmogorov
and Kuiper metrics are restricted to $\mathbb{R}$, but in higher dimensions
metrics on Vapnik-Červonenkis classes of sets retain many of their properties
[see Pollard (1984)]. We refer to Davies (1993) for their use in the regression
setting. The conflation of measurement error and probability in (5.1) can be
avoided as follows. We define 

(5.5) $d_{pk}(P, Q) = \inf\{\varepsilon > 0 : P(I) \le Q(I^{\varepsilon}) + \varepsilon \text{ for all intervals } I\}$,

where $I^{\varepsilon}$ denotes the interval with the same center as $I$ but
with length $|I|\exp(\varepsilon)$. All occurrences of $\varepsilon$ in (5.5)
are now dimensionless. The idea is not new. We refer to Davies (1992, 1993). 
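
The statement about rounding and the Kolmogorov metric is easy to check on a small example. The sketch below is illustrative only: the data values, the rounding to one decimal place and the function names are assumptions made for the example.

```python
def ecdf(sample, t):
    """Empirical distribution function of `sample` evaluated at t."""
    return sum(1 for x in sample if x <= t) / len(sample)

def kolmogorov_distance(xs, ys):
    """Sup over half-lines (-inf, t] of |P(I) - Q(I)| for two empirical measures.

    Both distribution functions are right-continuous step functions, so the
    supremum is attained at one of the jump points of either sample.
    """
    points = sorted(set(xs) | set(ys))
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in points)

data = [0.13, 1.27, 2.41, 3.58]          # minimum gap well above 0.1
rounded = [round(x, 1) for x in data]    # rounding to one decimal place

print(kolmogorov_distance(data, rounded))
```

Because the rounding is finer than the smallest gap, the order of the observations is preserved and the two empirical distribution functions never differ by more than one jump of height 1/n.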

Hampel's second argument for the Prohorov metric is that it metricizes 
weak convergence but we fail to see the relevance of this. The Kolmogorov 
metric (5.3) does not metricize weak convergence but nevertheless does have 
advantages over the Prohorov metric for proving central limit theorems. In 
particular we have 

(5.6) $d_{ko}(P_n, P) = O_P(1/\sqrt{n})$

uniformly in $P$. If $T$ is a functional with a bounded influence function
$I(x, T, P)$, then under appropriate regularity conditions 

(5.7) $T(P_n) - T(P) = \int I(x, T, P)\,d(P_n(x) - P(x)) + o_P(d_{ko}(P_n, P))$,

which in the light of (5.6) gives us a central limit theorem for
$\sqrt{n}(T(P_n) - T(P))$. The same reasoning fails for the Prohorov metric
because (5.6) does not hold [see Kersting (1978)]. 

6. Metrics on $\Theta$. We turn to the metric $D$ on $\Theta$ which quantifies
the "ruinous downfall." For location in $\mathbb{R}$ the choice
$D(\theta_1, \theta_2) = |\theta_1 - \theta_2|$ seems natural but the choice
$|\log(\theta_1/\theta_2)|$ for scale is not quite as obvious. It does,
however, have a strong justification in that numbers often have to be
standardized by division by scale. If so, a scale of zero is a "ruinous
downfall." In higher dimensions breakdown in scale includes the data being
concentrated on a lower-dimensional hyperplane, making it impossible to
identify the influence of individual covariables. Again the word breakdown
would seem appropriate. In an earlier version of our paper we considered the
possibility of measuring differences in the parameter $\theta$ by differences
in the corresponding distributions $P_\theta$ as in
$D(\theta_1, \theta_2) = d(P_{\theta_1}, P_{\theta_2})$ for an appropriate
metric $d$ on the space of distributions, but this needs to be given more
thought. 
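
The role of the logarithmic metric for scale can be illustrated with a small sketch. The median absolute deviation is used here purely as an example of a scale functional; it is not singled out in the discussion, and the data and function names are assumptions made for the illustration.

```python
import math
import statistics

def mad(values):
    """Median absolute deviation, used here as an illustrative scale functional."""
    center = statistics.median(values)
    return statistics.median([abs(v - center) for v in values])

def log_scale_distance(s1, s2):
    """D(s1, s2) = |log(s1 / s2)|, taken to be infinite if exactly one scale is zero."""
    if s1 == 0 or s2 == 0:
        return 0.0 if s1 == s2 else math.inf
    return abs(math.log(s1 / s2))

clean = [float(i) for i in range(1, 21)]   # 1, 2, ..., 20
mild = [10.5] * 5 + clean[5:]              # 5 of 20 points replaced by one value
imploded = [10.5] * 11 + clean[11:]        # 11 of 20 points replaced by one value

d_mild = log_scale_distance(mad(clean), mad(mild))
d_imploded = log_scale_distance(mad(clean), mad(imploded))
print(d_mild, d_imploded)
```

When a majority of the observations coincide the scale estimate is zero, standardization by division is impossible, and the distance to the uncontaminated value is infinite, whereas replacing only a quarter of the sample moves the estimate by a finite amount.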

Tyler has pointed out that if the parameter space is compact, then the
metric is bounded so that condition (3.1) of our paper cannot possibly be
satisfied. This is true, but just as metrics on $\mathcal{P}$ are chosen for
the problem, so we can choose metrics on $\Theta$ according to the problem. If
breakdown is defined in terms of convergence to some parameter values such as
those on the boundary, then we can choose an appropriate metric as follows. We
start by considering the problem of scale in $\mathbb{R}$. The proof works by
showing that if $\varepsilon > (1 - \Delta(P))/2$, then there exists an affine
transformation $A(x) = ax + b$ with $|a| \ne 1$ and, for any $n$, distributions
$Q_{1n}$ and $Q_{2n}$ satisfying 

$$d(P, Q_{1n}) < \varepsilon, \quad d(P, Q_{2n}) < \varepsilon, \quad T(Q_{1n}) = |a|^n T(Q_{2n}).$$

From this it follows that either 

$$\liminf(\min(T(Q_{1n}), T(Q_{2n}))) = 0 \quad\text{or}\quad \limsup(\max(T(Q_{1n}), T(Q_{2n}))) = \infty.$$

Using this fact we can define the breakdown point by 

(6.8) $\varepsilon^*(T, P, d, \{0, \infty\}) = \inf\{\varepsilon > 0 : \inf[T(Q) : d(Q, P) < \varepsilon] = 0 \text{ or } \sup[T(Q) : d(Q, P) < \varepsilon] = \infty\}$.

This definition makes no reference to a metric but two points on the boundary
of the parameter space, 0 and $\infty$, play a special role. The metric we
use in this case is $D(\theta_1, \theta_2) = |\log(\theta_1/\theta_2)|$ and,
not surprisingly, the points 0 and $\infty$ also play a special role here. The
result is that $\varepsilon^*(T, P, d, \{0, \infty\}) = \varepsilon^*(T, P, d, D)$.
We see that breakdown as defined by (6.8) can be reformulated in terms of an
appropriately chosen metric on the parameter space $\Theta$. This remains true
even if $\Theta$ is compact. Suppose $\Theta$ is equipped with a metric $D^*$,
bounded or not, and that some parameter value $\theta_0$ is regarded as
breakdown, for example, 0 in the scale context or 1 in the correlation context.
We define the metric $D_{\theta_0}$ on $\Theta$ by 

(6.9) De,{ei,e2) ^ ^ 



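It follows that if we keep $\theta_1$ constant, then
$D_{\theta_0}(\theta_1, \theta_2)$ tends to infinity if and only if $\theta_2$
tends to $\theta_0$. If we define in analogy to (6.8) 

(6.10) $\varepsilon^*(T, P, d, \{\theta_0\}) = \inf\{\varepsilon > 0 : \inf[D^*(\theta_0, T(Q)) : d(Q, P) < \varepsilon] = 0\}$,

then clearly $\varepsilon^*(T, P, d, \{\theta_0\}) = \varepsilon^*(T, P, d, D_{\theta_0})$.
If there is a set of parameter values $\Theta_0$ which are regarded as
breakdown, for example, the boundary points, we define 

(6.11) $D_{\Theta_0}(\theta_1, \theta_2) = \sup\{D_{\theta_0}(\theta_1, \theta_2) : \theta_0 \in \Theta_0\}$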

and again we have a metric which can be used to define breakdown. We 
define 

(6.12) $\varepsilon^*(T, P, d, \Theta_0) = \inf\{\varepsilon > 0 : \inf\{\inf[D^*(\theta_0, T(Q)) : d(Q, P) < \varepsilon] : \theta_0 \in \Theta_0\} = 0\}$

and it follows that $\varepsilon^*(T, P, d, \Theta_0) = \varepsilon^*(T, P, d, D_{\Theta_0})$ and also 

(6.13) $\varepsilon^*(T, P, d, \Theta_0) = \inf\{\varepsilon^*(T, P, d, \{\theta_0\}) : \theta_0 \in \Theta_0\}$.

Grize (1978), as we shall see below, defines breakdown as the minimum
contamination such that all points in $\Theta_0$ are reachable and not just
some such point. This can be accommodated by defining 

(6.14) $\varepsilon^{**}(T, P, d, \Theta_0) = \inf\{\varepsilon > 0 : \inf\{\sup[D_{\theta_0}(T(P), T(Q)) : d(P, Q) < \varepsilon] : \theta_0 \in \Theta_0\} = \infty\}$.

In contrast to (6.13) this definition results in 

(6.15) $\varepsilon^{**}(T, P, d, \Theta_0) = \sup\{\varepsilon^*(T, P, d, \{\theta_0\}) : \theta_0 \in \Theta_0\}$.

There are no doubt other variations. The conclusion is that if breakdown 
is defined as convergence to some set of exceptional parameter values, then 
this can be described by a metric as required in our theorem. It still leaves 
open the question as to whether such a definition of breakdown is sensible 
but this can only be answered on a case-to-case basis. 

7. Breakdown point? Rousseeuw in his contribution argues for the use 
of breakdown "value" rather than "point." We do not quite understand his 
reasoning and while usage is never absolute, we do not see any advantage in 
replacing "point" by "value." Hampel mentions the analysis of variance as 
one situation where the term breakdown point may not be appropriate. In 
the simple two-way table breakdown occurs if the majority of observations 
in any one row or column are badly contaminated, but this is too pessimistic 
and gives an artificially low breakdown point. In Terbeck and Davies (1998) 
the breakdown or interaction patterns for the two-way table are characterized
and it is shown how these are related to the $L_1$-solution and to Tukey's
median polish. Other articles concerned with patterns are Ellis and
Morgenthaler (1992) for $L_1$ regression and Kuhnt (2000) for contingency tables. 

8. Affine equivariance. We agree with Hampel that affine equivariance 
is not always a requirement and in two or more dimensions it is more diffi- 
cult to justify than in one. We made comments to this effect in our paper. 
Nevertheless it is not always the case that outliers are apparent in the single 
coordinates and to find these some sort of equivariance would seem to be 
required. An example of a simple data set for which it is not sufficient just to 
look at the coordinates is given on page 57 of Rousseeuw and Leroy (1987). 
Programs based on high breakdown methods are now readily available and 
in our opinion should be used in a routine manner [Becker and Gather (1999, 
2001), Rocke (1996) and Rousseeuw and van Driessen (1999)]. The costs are 
negligible and the returns can be substantial. 
Fig. 1. Samples differing from the initial sample (upper left) by 3 points (upper right), 
9 points (lower left) and 14 points (lower right) with the rank correlation changing from 
-0.332 to 0.278, 0.878 and 1, respectively. 

9. Correlation. This brings us to the perhaps most important part of the 
discussion. Hampel argues strongly that correlation provides an example of 
a useful concept of breakdown which does not have an equivariance struc- 
ture. We argue that he is wrong on both counts: the concept is not useful and 
it does have an equivariance structure, albeit a simple one. We give a detailed 
reply which touches on many of the points discussed so far. Grize (1978) gives 
two definitions of breakdown for a rank correlation functional $T_{rc}$. The first 
reads [see (6.14), (6.15)] 

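(9.16) $\varepsilon^{**}(T_{rc}, P, d, \{-1, 1\}) = \inf\{\varepsilon > 0 : \sup\{T_{rc}(Q) : d(P, Q) < \varepsilon\} = 1, \inf\{T_{rc}(Q) : d(P, Q) < \varepsilon\} = -1\}$

and the second reads 

(9.17) $\varepsilon^{**}(|T_{rc}|, P, d, \{0, 1\}) = \inf\{\varepsilon > 0 : \sup\{|T_{rc}(Q)| : d(P, Q) < \varepsilon\} = 1, \inf\{|T_{rc}(Q)| : d(P, Q) < \varepsilon\} = 0\}$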

for some appropriate metric d. For the total variation metric Grize calculates 
the breakdown points of Kendall's and Spearman's rank correlation for some 
particular distributions. We carry out a small experiment for Spearman's 
functional $T_{sc}$. The top left panel of Figure 1 shows 20 data points with 
Fig. 2. The upper left panel shows a distribution considered by Grize (1978). The upper 
right panel shows the same data after a monotone transformation. The bottom left panel 
shows the breakdown [in the sense of (9.17)] of Spearman's rank correlation to zero. The 
bottom right panel shows the breakdown of Spearman's rank correlation to 1. 



$T_{sc}(P_n) = -0.332$. Initially there are various sets of six points for which 
$y_i = h(x_i)$ with $h$ a nondecreasing function. We choose one and then move 
one of the remaining points at a time until finally after 14 moves all the 
points satisfy $y_i = h(x_i)$ with $h$ nondecreasing. For the final sample the rank 
correlation is 1 and we have, according to (9.16), breakdown. The top right 
panel of Figure 1 shows the sample after three moves, the bottom left after 
nine moves. The final sample is shown in the bottom right panel. At no stage 
do we experience a breakdown. Each sample differs only slightly from the 
previous one and the values of the rank correlation are perfectly reasonable 
for the sample they refer to. In Hampel's terminology there is no pole. In 
a similar vein, Figure 2 shows a distribution considered by Grize for which 
he calculates the breakdown point 0.1 of Spearman's rank correlation in the 
sense of (9.17). The top left panel shows the initial distribution and the top 
right panel the same data after a monotone transformation. A breakdown 
to zero is shown in the bottom left panel and to 1 in the bottom right 
panel. In our opinion the bottom right panel is the only one where one 
would not a priori question any observation and yet this is classified as 
breakdown. We now play a similar game in one dimension and consider a 
simple standard normal sample of size 20. We consider the median and as 
breakdown corresponds to an arbitrarily large value of the median we start 
with 100. The game is now to alter the initial sample point by point until 
after ten moves the value of the median is at least 100. The moves are 
almost prescribed. We choose any observation from the original sample and 
move it about 200 units to the right. After ten moves the median assumes 
a value of about 100. There is no other strategy. Even the first move alters 
the sample in a manner which distinguishes it immediately from the initial 
sample. Furthermore, when we progress from the ninth to the tenth move the 
median suddenly jumps from a value of about zero to one of about 100. We 
think this situation can be described by the word "breakdown." Moreover, 
it holds for any translation equivariant functional if one replaces the points 
carefully as in (6.2) and not as in (6.3). Any such functional must break 
down by the tenth move at the latest. 
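
The game with the median can be replayed in a few lines. This is a minimal sketch: the seed, the use of a pseudo-random normal sample and the choice of simply moving the first ten observations are assumptions made for the illustration.

```python
import random
import statistics

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(20)]   # standard normal, n = 20

current = list(sample)
medians = []
for move in range(10):
    current[move] += 200.0                 # move one observation 200 units right
    medians.append(statistics.median(current))

for move, value in enumerate(medians, start=1):
    print(move, round(value, 3))
```

For the first nine moves the median stays within the range of the original observations; at the tenth move it becomes the midpoint between the largest remaining original observation and the smallest moved one, which is roughly 100.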

We now consider the usual linear correlation functional $T_{lc}$. For the
initial data set of Figure 1 its value is $-0.258$. If we take any observation
and move it to the point $(\gamma, \gamma)$ and let $\gamma$ tend to infinity,
then $T_{lc}$ tends to 1. In this situation it seems reasonable to use the word
breakdown but perhaps discontinuity would be a better description. We analyze
the problem more closely. Linear correlation can be placed into the context of
our paper by introducing the following group structure. We define $G_{lc}$ to
be the group of transformations $g : \mathbb{R}^2 \to \mathbb{R}^2$ with 

(9.18) $g(x, y) = (a_1 x + b_1, a_2 y + b_2)$

with $a_1 a_2 \ne 0$. An equivariant functional $T$ is one which satisfies 

(9.19) $T(P^g) = \operatorname{sgn}(a_1 a_2)\,T(P)$.

Clearly the usual linear correlation functional $T_{lc}$ is equivariant w.r.t.
this group. The metric on the space of distributions is taken to be the strip 
metric 

(9.20) $d_{ST}(P, Q) = \sup\{|P(C) - Q(C)| : C \in ST\}$,
where $ST$ denotes the set of strips $C$ 

(9.21) $C = \{(x, y) : -\delta < ax + by + c < \delta\}, \quad a, b, c \in \mathbb{R}, \ \delta \in \mathbb{R}_+$.

We note that this metric is also "correct" as it is invariant under the 
group $G_{lc}$: 

(9.22) $d_{ST}(P, Q) = d_{ST}(P^g, Q^g), \quad g \in G_{lc}$.

There is also a version of this metric which corresponds to (5.5) [see Davies 
(1993)]. To fit into the structure in our paper we also require a metric $D$ on 
the parameter space $\Theta = [-1, 1]$. The precise metric is not important
because of the simple nature of the equivariance given in (9.19). To be
concrete we put in (6.9) 

$$D^*(\theta_1, \theta_2) := |\tan(\theta_1 \pi/2) - \tan(\theta_2 \pi/2)|,$$
which is consistent with the desire to have breakdown at $\pm 1$. 
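
Both the effect of a single far-out observation on the usual linear correlation and the sign equivariance under componentwise affine transformations can be checked numerically. The sketch below uses a pseudo-random sample rather than the data of Figure 1, and the helper `pearson`, the seed and the transformation constants are assumptions made for the illustration.

```python
import math
import random

def pearson(xs, ys):
    """Usual linear (product-moment) correlation of paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(20)]
ys = [random.gauss(0.0, 1.0) for _ in range(20)]
base = pearson(xs, ys)

# Move a single observation to (gamma, gamma) with gamma large.
gamma = 1e6
contaminated = pearson([gamma] + xs[1:], [gamma] + ys[1:])

# Componentwise affine transformation with a1 * a2 < 0: the sign flips.
a1, b1, a2, b2 = -2.0, 3.0, 0.5, -1.0
transformed = pearson([a1 * x + b1 for x in xs], [a2 * y + b2 for y in ys])

print(base, contaminated, transformed)
```

One observation out of twenty is enough to push the value arbitrarily close to 1, while the transformed sample returns the original value with the opposite sign.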

From (9.19) we see that the condition $G_1 \ne \emptyset$ is not satisfied and
Theorem 3.1 does not provide a nontrivial upper bound. Indeed, there is an 
equivariant correlation functional with breakdown point 1, namely $T \equiv 0$,
but to forestall protests we give another. For an empirical distribution $P_n$ 
we define 

(9.23) $T_{lc}^*(P_n) = \frac{1}{N} \sum_{I,\,|I| \ge 3} T_{lc}(I)$,

where $I$ is a subset of the data containing $|I|$ observations,
$N = 2^n - 1 - n - n(n-1)/2$ is the number of such subsets and $T_{lc}(I)$ is,
by an abuse of notation, $T_{lc}$ evaluated at the empirical measure based on
the set of observations in $I$. The functional $T_{lc}^*$ is equivariant and
also Fisher consistent. To calculate the breakdown point we consider an
empirical measure $P_n$ deriving from a sample of size $n$ from a continuous
distribution on $\mathbb{R}^2$ and another empirical measure $Q_n$. We assume
that the supports of each are contained in some compact set $K$. The reason for
these assumptions is to reduce complications due to the fact that the linear
correlation coefficient as usually defined requires the existence of moments.
We consider a sequence of $Q_n$ with $\lim_{n\to\infty} T_{lc}^*(Q_n) = 1$.
From (9.23) it follows that the support of $Q_n$ must be contained in a strip 

$$C_n = \{(x, y) : -\delta_n < a_n x + b_n y + c_n < \delta_n\}$$

with $\lim_{n\to\infty} \delta_n = 0$. As $P_n(C_n) \le 2/n$ for sufficiently
large $n$ we have $d_{ST}(P_n, Q_n) \ge 1 - 2/n$ and hence 

(9.24) $\varepsilon^{**}(T_{lc}^*, P_n, d_{ST}, \{-1, 1\}) \ge 1 - 2/n$

for this class of probability measures. We generalize this result in a manner 
which emulates the setting of our paper. As $G_1$ is empty we reformulate the 
definition of the functional $\Delta(P)$ of (3.3) in our paper as follows. We set 

(9.25) $\Delta(P) = \sup\{P(B) : T(Q) \text{ not definable for } Q \text{ with } \operatorname{supp}(Q) \subset B\}$.

For example, in the case of scale in $\mathbb{R}$ the relevant sets $B$ are
singletons and a measure concentrated on a singleton must have scale either
zero or $\infty$ to be equivariant, both of which are excluded. If, following
Grize, a linear correlation of $\pm 1$ is defined to be breakdown, the
corresponding sets are lines and this leads to 

(9.26) $\Delta_+(P) = \sup\{P(C) : C = \{(x, y) : ax + by + c = 0\}, \ ab < 0\}$,

(9.27) $\Delta_-(P) = \sup\{P(C) : C = \{(x, y) : ax + by + c = 0\}, \ ab > 0\}$,
and it follows that 



(9.28) $\varepsilon^{**}(T_{lc}^*, P, d_{ST}, \{-1, 1\}) = 1 - \min\{\Delta_+(P), \Delta_-(P)\}$.
The reasoning can be extended to rank correlation and this gives a more 
elegant theory as there are no problems with moments. The appropriate 
group is $G_{rc}$ which consists of all transformations
$g : \mathbb{R}^2 \to \mathbb{R}^2$ of the form 

(9.29) $g((x, y)) = (\xi(x), \eta(y)), \quad \xi, \eta : \mathbb{R} \to \mathbb{R}$,

where each of $\xi$ and $\eta$ is either strictly increasing or strictly
decreasing. A correlation functional $T_{rc}$ is equivariant with respect to
this group if 

(9.30) $T_{rc}(P^g) = \operatorname{sgn}(\xi \circ \eta)\,T_{rc}(P)$,

where $\operatorname{sgn}(\xi) = \pm 1$ depending on whether $\xi$ is strictly
increasing or decreasing. The natural metric is the tube metric 

(9.31) $d_{TU}(P, Q) = \sup\{|P(C) - Q(C)| : C \in TU\}$,
where $TU$ denotes the set of monotonic tubes $C$ 

(9.32) $C = \{(x, y) : -\delta < h(x) + y < \delta\}, \quad h : \mathbb{R} \to \mathbb{R} \text{ strictly monotonic}, \ \delta \in \mathbb{R}_+$.

The metric is "correct" in that it is invariant with respect to the group $G_{rc}$: 

(9.33) $d_{TU}(P, Q) = d_{TU}(P^g, Q^g), \quad g \in G_{rc}$.

As we now require correlations of $\pm 1$ only for data points which are
strictly increasing or decreasing, we define analogously to (9.26) and (9.27), 

(9.34) $\Delta_+(P) = \sup\{P(C) : C = \{(x, y) : y = h(x)\}, \ h \text{ strictly increasing}\}$,

(9.35) $\Delta_-(P) = \sup\{P(C) : C = \{(x, y) : y = h(x)\}, \ h \text{ strictly decreasing}\}$.

From this it follows for Spearman's rank correlation functional $T_{sc}$ that 

(9.36) $\varepsilon^{**}(T_{sc}, P, d_{TU}, \{-1, 1\}) = 1 - \min\{\Delta_+(P), \Delta_-(P)\}$.

In fact (9.36) holds for any functional for which $T_{rc}(P) = 1$ or $-1$ if
and only if $\Delta_+(P) = 1$ or $\Delta_-(P) = 1$, respectively, and
consequently it also holds for Kendall's $\tau$. The appearance of min in
(9.28) and (9.36) is due to Grize's definition (9.16) of breakdown which refers
to both boundary points. Usable estimates of $\Delta_+(P_n)$ are available for
empirical measures $P_n$ deriving from nonatomic i.i.d. random variables in
each component; that is, the components are also independent. Let the sample be
$(X_i, Y_i)$, $i = 1, \ldots, n$, and consider the points
$(X_{i_1}, Y_{i_1}), \ldots, (X_{i_k}, Y_{i_k})$ with the $X_{i_j}$,
$j = 1, \ldots, k$, in increasing order. The points lie on some curve
$y = h(x)$ for a strictly increasing $h$ if and only if the $Y_{i_j}$,
$j = 1, \ldots, k$, are also in increasing order. The probability of this is
$1/k!$. There are $\binom{n}{k}$ different samples of size $k$ and we see that
the probability that at least $k$ points lie on some curve $y = h(x)$ is at most 

$$\frac{1}{k!}\binom{n}{k} \le \frac{n^k}{(k!)^2}.$$

By maximizing over $k$ we obtain 

$$\Delta_+(P_n) = O(1/\sqrt{n})$$

and it follows from (9.36) 

(9.37) $\varepsilon^{**}(T_{sc}, P_n, d_{TU}, \{-1, 1\}) \ge 1 - O(1/\sqrt{n})$.
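
For an empirical measure without ties, the largest fraction of points lying on a strictly increasing curve is the length of the longest increasing subsequence of the y-values, taken in the order of the x-values, divided by n. The following sketch computes it for independent uniform components; the function names, the seed and the sample sizes are assumptions made for the illustration.

```python
import bisect
import random

def longest_increasing_length(values):
    """Length of the longest strictly increasing subsequence (patience sorting)."""
    tails = []  # tails[j] = smallest possible last value of a run of length j + 1
    for v in values:
        pos = bisect.bisect_left(tails, v)
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(tails)

def delta_plus(xs, ys):
    """Largest fraction of sample points on one strictly increasing curve."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    return longest_increasing_length([ys[i] for i in order]) / len(xs)

random.seed(2)
results = {}
for n in (100, 400, 1600):
    xs = [random.random() for _ in range(n)]
    ys = [random.random() for _ in range(n)]
    results[n] = delta_plus(xs, ys)
    print(n, results[n], 1 / n ** 0.5)
```

The printed fractions can be compared with $1/\sqrt{n}$ in the last column; they shrink at the same rate as the sample size grows.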

The fact that (9.36) also holds for Kendall's $\tau$ apparently contradicts
Hampel's comments, but this is not so because it is definition (9.17) of
breakdown to which Hampel's comments apply. To proceed we consider the problem
of maximizing $\Delta_+(P)$ subject to $T_{rc}(P) = 0$. For $T_{sc}$ the answer is 
$\Delta_+(P) = \sqrt[3]{1/2}$, which is attained at a distribution for which
the rank of $x_i$ is $i$ and the rank of $y_i$ is $k + i$, $1 \le i \le n - k$,
and $n - i + 1$, $n - k + 1 \le i \le n$, with $k = n(1 - \sqrt[3]{1/2})$. The
corresponding result for Kendall's $\tau$ replaces $\sqrt[3]{1/2}$ by
$\sqrt{1/2}$. If now $Q$ is any distribution with $\Delta_+(Q) = 1$, it follows
that the breakdown point [in the sense of (9.17)] at $Q$ is
$1 - \sqrt[3]{1/2} = 0.2063$ for Spearman's rank correlation and
$1 - \sqrt{1/2} = 0.2929$ for Kendall's $\tau$. If we now move only half of
this mass, it is clear that we can obtain distributions $Q_1$ and $Q_2$ at
which Spearman's and Kendall's rank correlations have breakdown points of
$(1 - \sqrt[3]{1/2})/2$ and $(1 - \sqrt{1/2})/2$, respectively, and that 
these are the smallest possible breakdown points. We have not understood 
Hampel's claim relating $BP(K)$ and $BP(S)$; as far as we can see, these refer
to different distributions, one with $\Delta_+(Q_1) = 0.85$ and one with
$\Delta_+(Q_2) = 0.9$, but this is only a minor point. The tube metric $d_{TU}$
is stronger than the strip metric $d_{ST}$ but considerably weaker than the
total variation metric $d_{tv}$ used by Grize. In particular it allows for
wobbling of the observations. 
We also note that neither metric suffers from the deficiency of the Prohorov 
metric of mixing dimensionless probabilities with measurement units. The 
class ST of strips has polynomial discrimination but not the class TU as 
there are arbitrarily large finite subsets of $\mathbb{R}^2$ which can be shattered by $TU$ 
[see Pollard (1984)]. 
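
The two extremal fractions can be checked numerically on the rank configuration described above: an increasing block followed by a decreasing block. The index ranges used below are one reading of that configuration, and the helper names and the sample size are assumptions made for the sketch.

```python
def block_ranks(n, frac):
    """y-ranks for x-ranks 1..n: an increasing block of about frac * n points
    occupying the top y-ranks, followed by a decreasing block."""
    m = round(n * frac)           # size of the increasing block
    k = n - m                     # size of the decreasing block
    increasing = [k + i for i in range(1, m + 1)]
    decreasing = [n - i + 1 for i in range(m + 1, n + 1)]
    return increasing + decreasing

def spearman(ranks):
    """Spearman's rank correlation for x-ranks 1..n and the given y-ranks."""
    n = len(ranks)
    d2 = sum((i - r) ** 2 for i, r in enumerate(ranks, start=1))
    return 1 - 6 * d2 / (n * (n * n - 1))

def kendall(ranks):
    """Kendall's rank correlation by direct counting of concordant pairs."""
    n = len(ranks)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            s += 1 if ranks[j] > ranks[i] else -1
    return 2 * s / (n * (n - 1))

n = 1000
rho = spearman(block_ranks(n, 0.5 ** (1 / 3)))   # increasing fraction (1/2)^(1/3)
tau = kendall(block_ranks(n, 0.5 ** 0.5))        # increasing fraction (1/2)^(1/2)
print(rho, tau)
print(1 - 0.5 ** (1 / 3), 1 - 0.5 ** 0.5)        # the two breakdown values
```

Both correlations are close to zero for these configurations, up to the discretization caused by rounding the block size to an integer, and the last line reproduces the values 0.2063 and 0.2929.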

Finally, we point out the differences to Theorem 3.1. If we take the usual 
definition of breakdown as a worst-case situation rather than Grize's defini- 
tion which is a sort of best worst-case definition, then the breakdown point 
of Kendall's or Spearman's rank correlation is 

(9.38) $\varepsilon^*(T_{rc}, P, d_{TU}, \{-1, 1\}) = 1 - \Delta(P)$,
where 

$$\Delta(P) = \max\{\Delta_+(P), \Delta_-(P)\}.$$

In the general situation we argue as follows. Let $\mathcal{P}_0$ denote the
set of distributions at which $T$ breaks down, which means that their support
is contained in some exceptional subset of the sample space as in (9.25).
Suppose $0 < \Delta(P) < 1$ and choose an exceptional subset $B_0$ of the
sample space with $P(B_0) = \alpha$, $0 < \alpha \le \Delta(P)$. If we define
$Q_0(\cdot) = P(\cdot \cap B_0)/P(B_0)$ and
$Q_1(\cdot) = P(\cdot \cap (\mathcal{X} \setminus B_0))/(1 - P(B_0))$, then
$Q_0$ and $Q_1$ are probability measures with
$P = \alpha Q_0 + (1 - \alpha) Q_1$. If the metric on $\mathcal{P}$ satisfies
(2.2) of our paper, we see that $d(P, Q_0) \le 1 - \alpha$ and this implies 

(9.39) $\varepsilon^*(T, P, d, D) \le 1 - \Delta(P)$.

This differs from the claim of Theorem 3.1 by the factor of 1/2 and it is 
precisely the group structure which produces this factor. Because of equiv- 
ariance things start going wrong before one reaches an arbitrarily small 
neighborhood of some point in $\mathcal{P}_0$. As Tyler mentions in his
contribution, heuristic justifications for the factor of 1/2, such as not being
able to distinguish between good and bad data, are too vague. One of the
challenges of this paper is to obtain the factor of 1/2 or even some other
factor without an equivariance structure. 
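
The distance bound used in this argument can be written out in one line for metrics of the form $d(P, Q) = \sup\{|P(C) - Q(C)| : C \in \mathcal{C}\}$, such as the strip and tube metrics. That the metric is of this form is an assumption of the sketch; the general condition is the one referred to as (2.2) of the paper. For any set $C$ in the class and the decomposition $P = \alpha Q_0 + (1 - \alpha) Q_1$,

```latex
\begin{align*}
|P(C) - Q_0(C)|
  &= |\alpha Q_0(C) + (1 - \alpha) Q_1(C) - Q_0(C)| \\
  &= (1 - \alpha)\,|Q_1(C) - Q_0(C)| \\
  &\le 1 - \alpha .
\end{align*}
```

Taking the supremum over $C$ gives $d(P, Q_0) \le 1 - \alpha$, and since the support of $Q_0$ lies in the exceptional set, letting $\alpha$ approach $\Delta(P)$ gives the upper bound $1 - \Delta(P)$.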

10. Principal component vectors. Tyler argues that it may be possible 
to define a reasonable concept of breakdown for principal component vectors, 
although he recognizes that there are problems involved. The idea is that 
breakdown occurs if contamination results in the first principal component 
vector being orthogonal to the first principal component vector without 
contamination. This example cannot be reformulated in terms of metrics as 
described in Section 6, as there is no special set of parameter values $\Theta_0$. 
Furthermore, it is not possible to adjust the proof of Theorem 3.1 to include 
this case. However, we shall now argue that it does not make sense to talk 
about the breakdown of the principal component vectors without reference 
to the corresponding eigenvalues. 

Consider a two-dimensional data set for which the eigenvalues are the 
same. The set of first principal component vectors is now the set of all 
points on the unit circle with $\theta$ and $-\theta$ being identified. The smallest alter-
ation of any observation will cause the space to collapse to a single direction,
say $\theta_1 = (1,0)$, with the second principal component vector $\theta_2 = (0,1)$ being
orthogonal to it. It is clear that there exists an arbitrarily small perturba-
tion of the original data set such that $\theta_1 = (0,1)$ and $\theta_2 = (1,0)$. In other
words, there exist data sets for which arbitrarily small perturbations cause 
breakdown in the first principal component vector. The perturbations can 
be so small as to be nondetectable and any computer of finite precision or 
some nonoptimal numerical recipe may result in the wrong answer and be 
the "cause" of the breakdown. It seems to us that this situation is one which 
is not describable by the word "breakdown." We cannot think of any useful 
statistical procedure which can be made to break down by the smallest of 
perturbations of the data set. In practice, of course, use is not just made
of the first principal component vector but of all those principal compo-
nent vectors for which the eigenvalues are in some sense large. One data
analytical strategy is to look at the two-dimensional plots on the first two 
principal component vectors, and here it is irrelevant if they are the wrong 
way round. The principal component vectors are defined as those directions 
where the variability of the data, however measured, is particularly large. 
The word "breakdown" can be more appropriately applied to a situation in 
which the large variability is the result of outliers and causes a direction of 
small variability to become one of large variability. It seems to us to be clear 
that the size of the eigenvalues will have to be taken into account. Princi- 
pal component vectors do not therefore constitute a counterexample to our 
meta claim of no nontrivial theory of breakdown without groups. In spite 
of this Tyler has alerted us to the possibility of breakdown being defined in 
terms of a relationship between two parameter values rather than closeness 
to some specific parameter values. We cannot exclude the possibility of there 
being some perfectly reasonable concept of breakdown of this nature. 
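
The two-dimensional example with equal eigenvalues can be reproduced numerically. The sketch below is only an illustration: the four data points, the perturbation size 1e-6 and the helper names `second_moments`, `first_pc` and `perturb` are all assumptions introduced here. It uses the closed form for the leading eigenvector of a symmetric 2x2 matrix.

```python
import math


def second_moments(points):
    """Return (var_x, cov_xy, var_y) of a two-dimensional data set."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n
    c = sum((p[1] - my) ** 2 for p in points) / n
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return a, b, c


def first_pc(points):
    """Leading eigenvector of the covariance matrix, or None if every
    direction is a first principal component (equal eigenvalues)."""
    a, b, c = second_moments(points)
    if b == 0.0:
        if a == c:
            return None
        return (1.0, 0.0) if a > c else (0.0, 1.0)
    lam = 0.5 * (a + c) + math.hypot(0.5 * (a - c), b)  # largest eigenvalue
    vx, vy = b, lam - a                                 # solves (A - lam I) v = 0
    norm = math.hypot(vx, vy)
    return (vx / norm, vy / norm)


def perturb(points, index, delta):
    """Copy of points with delta added to the observation at `index`."""
    out = list(points)
    out[index] = (out[index][0] + delta[0], out[index][1] + delta[1])
    return out


# Hypothetical data set whose covariance matrix has two equal eigenvalues.
base = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

print(first_pc(base))                             # no unique direction
print(first_pc(perturb(base, 0, (1e-6, 0.0))))    # collapses to (1, 0)
print(first_pc(perturb(base, 2, (0.0, 1e-6))))    # collapses to (0, 1)
```

Two perturbations of size 1e-6 applied to different observations yield orthogonal first principal component vectors, which is the behavior described above.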

11. Fisher consistency. Hampel does not like the example of regression 
through the origin and neither do we. It was included as an answer to a ref- 
eree as to whether it was possible to construct a Fisher consistent functional 
with a breakdown point of at least 1/3. Fisher consistency seems to be 
the obvious candidate to replace group equivariance as a desirable prop- 
erty of a functional. We do not give a theorem as there are difficulties in 
defining what is meant by a reasonable parametric family, but a parametric
family $\{P_\theta, \theta \in \Theta\}$ typically forms a very sparse subset of the set of all
models. This is indicated by Figure 3 where the line represents the family 
of models in the space of all probability measures and the circles indicate 
an infinitesimal neighborhood. Fisher consistency describes the behavior of 
the functional only in the infinitesimal neighborhood. We are left free to 
define the functional elsewhere and this is what we exploit in our example. 
Equivariance considerations prevent this form of local definition. The orbits 
connect points which are far apart in the space of probability models and 
this prevents constructions such as the one we give. 

12. The samples (6.2) and (6.3). Hampel, Rousseeuw and Tyler all com- 
ment on the samples (6.2) and (6.3). Rousseeuw correctly remarks that one 
can often calculate the breakdown point of a functional directly and that 
such direct proofs do not rely on a repetition. Hampel says, also correctly, 
that the unnamed functional (there are many) must have a low breakdown 
and suggests that perhaps some small print is missing. What is missing is 
some large print explaining exactly what we intended with these two exam-
ples. Tyler saw clearly what was intended and has made some very interest-
ing comments on (6.2) and (6.3). He also explicitly mentions the connection
with the area of computer vision which was one of our motivations as we
indicate below. He points out that such apparently well-understood func-
tionals as appropriately tuned redescending M-functionals can exhibit
the same behavior. We confess to not having been aware of this and we would
have chosen another example had we known. As Tyler says, under appropri-
ate conditions redescending affine equivariant M-functionals do not break
down even under 99% contamination. This is exactly the phenomenon to
which we intended to bring attention.

Fig. 3. A thin parametric model and an infinitesimal neighborhood within which Fisher
consistency becomes relevant.

The proof of Theorem 3.1 relies in part on exactly reproducing a portion 
of the data elsewhere. If there is no exact repetition as in (6.3), there will 
be many equivariant functionals which do not break down. One choice for 
sample (6.3) is 

(12.40)  $(T_1(x_n), T_2(x_n)) = \arg\min_{\mu, \sigma} \Big\{ \sum_{j=1} r_{(j)}(\mu, \sigma)^2 \Big\}$,

where

$r_i(\mu, \sigma)^2 = \min \ldots$

and $z_1 = 1.5$, $z_2 = 1.8$ and $z_3 = 1.3$. In this connection we mention Oja's
example of linear regression at the end of his Section 1. We fail to follow his 
argument as to why the estimate becomes uninformative. As it stands, the
argument seems to make no use of the assumption n = 2k, in which case we
can put k = 1 and the conclusion would seem to be that every regression 
equivariant functional has a breakdown point of 1/n. If n = 2k is implicitly 
meant, then breakdown occurs only if we cannot distinguish between the 
two samples. If we can distinguish between the two samples, for his example 
if $x_1 = x_2 = \cdots = x_k = 0$, then what is claimed as breakdown is nothing but
equivariance (see Section 2 above). 

At first glance the functional (12.40) may seem very artificial but this is 
not so. It is constructed to find a particular pattern in the sample, namely 
affine transformations of 1.5, 1.8 and 1.3. Figure 4 shows the smile of the 
Cheshire cat and the problem is to locate it in a sea of noise into which it 
gradually disappears. This is only possible as the noise does not reproduce 
the signal. For real examples from the area of computer vision we refer 
to Wang and Suter (2004). In analytical terms one can define a modified 
breakdown point by 

(12.41)  $\varepsilon^*(T, P, d, \mathcal{H}) = \inf\{\varepsilon > 0 : \sup_{d(P,Q) < \varepsilon,\, Q \in \mathcal{H}} |T(P) - T(Q)| = \infty\}$,

where $\mathcal{H}$ specifies what you want to protect yourself against. If $\mathcal{H}$ does not
allow a repetition of the signal elsewhere, then affine equivariant functionals 
can attain breakdown points higher than 1/2. Moreover, the usual high 
breakdown functionals are typically of no help in this situation [see Wang 
and Suter (2004)]. 
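
To make the idea of a pattern-seeking functional concrete, here is a minimal Python sketch. It is not the functional (12.40) itself: the standardized residual, the restriction to positive scale and the search over pairs of observations are assumptions chosen for illustration, as are the names `pattern_cost` and `locate_pattern` and the sample values. The sketch looks for a location and scale under which affine images of 1.5, 1.8 and 1.3 all appear in the sample.

```python
from itertools import permutations

PATTERN = (1.5, 1.8, 1.3)


def pattern_cost(sample, mu, sigma, pattern=PATTERN):
    """Sum over pattern points of the squared standardized distance
    to the nearest observation (an assumed residual, for illustration)."""
    return sum(min(((x - mu) / sigma - z) ** 2 for x in sample) for z in pattern)


def locate_pattern(sample, pattern=PATTERN):
    """Search (mu, sigma) candidates obtained by matching two observations
    to the first two pattern points; return (cost, mu, sigma) of the best."""
    z1, z2 = pattern[0], pattern[1]
    best = None
    for xa, xb in permutations(sample, 2):
        sigma = (xb - xa) / (z2 - z1)
        if sigma <= 0:              # assumption: positive scale only
            continue
        mu = xa - sigma * z1
        cost = pattern_cost(sample, mu, sigma, pattern)
        if best is None or cost < best[0]:
            best = (cost, mu, sigma)
    return best


# Hypothetical sample: five arbitrary points plus the pattern with mu=10, sigma=2.
noise = [0.0, 4.0, 6.0, 20.0, 40.0]
signal = [10.0 + 2.0 * z for z in PATTERN]
sample = noise + signal

cost, mu, sigma = locate_pattern(sample)
print(mu, sigma, cost)
```

In this sample five of the eight observations are unrelated to the pattern, yet the pattern is located because none of the other points reproduces it. The assumed cost is unchanged when the whole sample is shifted and rescaled by a positive factor, so the located values transform accordingly.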





Fig. 4. The smile of the Cheshire cat gradually disappearing as it is corrupted by noise.

13. Nonparametric statistics. In this paper we have shown that the concept
of breakdown point has been generally accepted only in situations where
there is a group structure sufficiently rich to allow the calculation of a non- 
trivial upper bound for the breakdown point. In his contribution Hampel 
speculates that this could be the reason why the breakdown point for corre- 
lation coefficients has not yet been widely accepted. In spite of this and as 
pointed out by an Associate Editor, we have not proved a theorem to the 
effect that a breakdown point is only sensible when a rich group structure 
exists. It is difficult to imagine what such a theorem would look like. Nev- 
ertheless the paper, the contributions of the discussants and our reply do 
seem to indicate that it will not be easy to come to an acceptable defini- 
tion of breakdown with a nontrivial upper bound without a group structure. 
There is perhaps another reason why some definitions of breakdown have 
been successful. They are defined for so-called nonparametric functionals in 
the sense of Bickel and Lehmann (1975a, b). One can always calculate the 
median of a distribution in $\mathbb{R}$ and this is not associated with a restrictive
stochastic model. We wish to emphasize this as we have the impression that 
it is sometimes assumed that functionals are only to be applied to data which 
is generated by some stochastic model but with contamination. Genton and 
Lucas entitle a section "Breakdown point for (in)dependent observations," 
which suggests at least to us that they distinguish between samples which 
are generated by independent random variables and those which are not. 
The title of Genton and Lucas (2003) also tends in this direction. They
write "$\mathcal{Y}$ is the set of all allowable samples" and later "$\mathcal{Y}$ is the set of all
stationary AR(1) processes." We think the intention is clear. The data are
generated by a stationary AR(1) process and then contaminated by the out-
liers. On the other hand, the only possible mathematical interpretation of
"the set of all stationary AR(1) processes" is the support of the model. As
the support of an AR(1) process with Gaussian innovations is $\mathbb{R}^n$, this means
simply all samples, $\mathcal{Y} = \mathbb{R}^n$. Thus what at first glance seems plausible turns
out to be untenable. This is the reason why the restrictions we place on data 
are analytical ones and not distributional ones. The median can be success- 
fully applied to data which are very obviously dependent [see, e.g., Davies, 
Fried and Gather (2004)], but consider the data in Figure 5. For these data 
it makes no sense to artificially restrict $\rho$ to the interval $[-1, 1]$. Rather one
would point out that the data can be well approximated by an AR(1) model
with $\rho = -1.25$ but not by a stationary AR(1) model. This simple message
is lost if we are forced to specify a value in $[-1, 1]$. If breakdown is going to
be meaningful in such a situation we suspect that it should be applied to a 
statistical procedure and not to the behavior of a single functional. 
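
The point about restricting the autoregressive parameter can be illustrated with a short simulation. The sketch below generates a series of length 14 from the recursion with coefficient -1.25 and noise scale 0.2, fits the coefficient by least squares, and compares the fit with its value clipped to [-1, 1]. The starting value 1.0, the random seed, the use of least squares and the function names are assumptions made here for illustration.

```python
import random


def simulate_ar1(rho, sigma, n, x0, rng):
    """Generate n values of X_{t+1} = rho * X_t + sigma * Z(t), Z standard Gaussian."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(rho * xs[-1] + sigma * rng.gauss(0.0, 1.0))
    return xs


def least_squares_rho(xs):
    """Least-squares slope of X_{t+1} on X_t (no intercept)."""
    num = sum(a * b for a, b in zip(xs[:-1], xs[1:]))
    den = sum(a * a for a in xs[:-1])
    return num / den


def clip_to_stationary(rho):
    """Force a coefficient into the interval [-1, 1]."""
    return max(-1.0, min(1.0, rho))


rng = random.Random(0)                         # assumed seed
sample = simulate_ar1(-1.25, 0.2, 14, 1.0, rng)
rho_hat = least_squares_rho(sample)

print("unrestricted fit :", rho_hat)
print("clipped to [-1,1]:", clip_to_stationary(rho_hat))
```

Because the series grows geometrically, the unrestricted fit is typically close to -1.25, whereas the clipped value carries no trace of the explosive behavior.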

Fig. 5. A sample of size 14 generated by $X_{t+1} = -1.25 X_t + 0.2 Z(t)$ with $Z(t)$ standard
Gaussian white noise.

14. Breakdown without groups and alternatives. We have argued above
that the only generally accepted definitions of breakdown are in situations
where there is a sufficiently rich group and equivariance structure. If a need
is felt to extend it to other situations, we state what we think are the min- 
imal requirements. First, the definition should be capable of being made 
precise. He argues that breakdown is the smallest fraction of contamination 
which makes a test statistic "uninformative or unusable." Later he argues 
that breakdown should have the same degree of vagueness as he claims to 
be the case with outliers. He continues that "when every statistician starts 
to talk about his or her own notion of a breakdown point, I think we have 
made it." We think there are dangers in such an attitude. Ostensive defini- 
tions of breakdown with statisticians pointing in all directions are unlikely 
to contribute to a general acceptance of the word. Intuition is important, 
but just as is the case with outliers [see Davies and Gather (1993)] much 
is to be gained by undertaking the attempt to give a precise definition and 
to investigate its consequences. This not only deepens the understanding, it 
also sharpens the intuition. Semantics is important and we think that any 
generalizations of the concept of breakdown should be such as to be recogniz- 
ably referring to some common element, in particular the presence of some 
natural pole. Second, it is not sufficient to give a definition of breakdown 
and show that it gives the correct answer in some particular cases. A defini-
tion of breakdown should be subjected to some form of analysis, including
its stability under perturbations. The onus is on those who propose defini-
tions of breakdown to do this. Third, the definition should be simple and
intuitively appealing. Here we agree completely with He. If it requires more
than sixty seconds to understand a definition, it is probably bad. Fourth, 
when calculating breakdown points use should be made of metrics which 
can accommodate rounding errors. Gross error neighborhoods and the to- 
tal variation metric are too restrictive. Fifth, the definition should not be 
too restrictive and only apply to one single functional. It should apply to a 
whole family of functionals which offer different possibilities of quantifying 
the feature of interest, location, scale, correlation or nonparametric function. 
Sixth, there should be a class of reasonable functionals for which it makes 
sense to compare breakdown points. If such a definition of breakdown is not 
possible, there are alternatives. One is simply to compare different function- 
als by their continuity or bias properties, again if possible in weak metrics. 
For this to make sense it is not necessary that an explosion occurs. It may 
be that this proves more useful than trying to extend the idea of breakdown 
to situations for which it is not suitable. 

15. Conclusion. We thank all discussants for their contributions and 
hope that the disagreements that are apparent have been clarified by our 
rejoinder. In our paper we have not proved that breakdown without equiv-
ariance is not a sensible concept. On the other hand, in all situations we
are aware of in which there is no or little equivariance (made precise by
our main theorem), either (i) breakdown points of 1 are attainable or
(ii) the word breakdown is inappropriate (the movement from the top right 
to the bottom right panel of Figure 2) or (iii) the very definition of break- 
down point is inadequate. An example without these weaknesses would be 
interesting.